This document uses the pointblank R package to create data validation reports. In each report table below, any failing validation step should have a CSV button that downloads a .csv file containing only the rows that failed that particular step.
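The CSV button draws on the data extracts that pointblank collects for failing rows during `interrogate()`. The same rows can be retrieved in code with `get_data_extracts()`. The sketch below uses a made-up four-row table, and the output file name is hypothetical.

```r
library(pointblank)

# Made-up stand-in for a survey table (hypothetical values)
plants <- data.frame(
  plant_id = c("p1", "p2", "p3", "p4"),
  shts     = c(3, 25, 7, 40)   # p2 and p4 fall outside the 0-20 range
)

agent <- create_agent(tbl = plants, label = "Extract demo") |>
  col_vals_between(columns = vars(shts), left = 0, right = 20) |>
  interrogate()   # extract_failed = TRUE by default, so failing rows are kept

# Rows that failed step 1; these should match what the report's CSV button downloads
failed_rows <- get_data_extracts(agent, i = 1)
print(failed_rows)

# Save them to disk, mirroring the CSV download (file name is hypothetical)
out_file <- file.path(tempdir(), "step1_failures.csv")
write.csv(failed_rows, out_file, row.names = FALSE)
```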

Check missing values

Action levels

By default, a validation step warns if one or more rows fail and stops (errors) if 2% or more of rows fail. Some checks use a stricter action level that stops if any row fails.

```r
library(pointblank)

al_default <- action_levels(warn_at = 1, stop_at = 0.02) # warn if even one row fails; stop if 2% or more of rows fail
al_strict  <- action_levels(stop_at = 1)                  # stop if even one row fails
```
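The following base-R sketch shows how these thresholds translate into a step's status. The helpers `reaches()` and `step_status()` are hypothetical and are not pointblank functions. They assume pointblank's documented convention: a threshold of 1 or more counts failing rows, a threshold below 1 is a fraction of all rows, and a level is reached when failures meet or exceed the threshold.

```r
# Hypothetical helper: has a single action level been reached?
reaches <- function(n_failed, n_units, threshold) {
  if (is.null(threshold)) {
    FALSE                                  # this level is not set
  } else if (threshold >= 1) {
    n_failed >= threshold                  # absolute count of failing rows
  } else {
    n_failed / n_units >= threshold        # fraction of rows failing
  }
}

# Hypothetical helper: overall status, checking the stricter level first
step_status <- function(n_failed, n_units, warn_at = 1, stop_at = 0.02) {
  if (reaches(n_failed, n_units, stop_at)) {
    "stop"
  } else if (reaches(n_failed, n_units, warn_at)) {
    "warn"
  } else {
    "pass"
  }
}

step_status(8, 67000)      # "warn": at least one failure, far below 2%
step_status(0, 67000)      # "pass"
step_status(1500, 67000)   # "stop": 1500 / 67000 is about 2.2%
# al_strict: no warn level, stop on any failure
step_status(8, 67000, warn_at = NULL, stop_at = 1)   # "stop"
```

The figure of 8 failures out of roughly 67,000 rows comes from step 6 of the data validation report. Under the default levels, that step warns but stays well below the stop threshold.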

Data Validation

The two datasets submitted with the data paper are HDP_plots.csv and HDP_1997_2009.csv.

Checks for data type, range, and duplicates
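Below is a minimal sketch of how an agent like the one behind the following report might be assembled. It assumes a data frame with the column names shown in the report (`subplot`, `plot_id`, `plant_id`, `ht`, `shts`, `infl`). The three-row table is made up, and only some of the steps are shown. The `na_pass = TRUE` setting and the use of `al_strict` on the `plant_id` step are assumptions, not the author's code. Default action levels are attached to the whole agent through `create_agent(actions = ...)`, and a single step can override them with its own `actions` argument.

```r
library(pointblank)

al_default <- action_levels(warn_at = 1, stop_at = 0.02)
al_strict  <- action_levels(stop_at = 1)

# Tiny made-up stand-in for the validated survey table
hdp <- data.frame(
  plot_id  = c("CF-1", "CF-1", "FF-7"),
  subplot  = c("A1", "B10", "J10"),
  plant_id = c("101", "102", "103"),
  ht       = c(12, 45, 230),   # 230 cm falls outside [0, 200]
  shts     = c(1, 4, 2),
  infl     = c(NA, 1, 0)
)

agent <- create_agent(tbl = hdp, label = "Data Validation", actions = al_default) |>
  # Subplot codes A1..A10 through J1..J10
  col_vals_in_set(columns = vars(subplot),
                  set = paste0(rep(LETTERS[1:10], each = 10), 1:10)) |>
  col_vals_in_set(columns = vars(plot_id),
                  set = c(paste0("CF-", 1:6), paste0("FF-", 1:7))) |>
  col_vals_expr(expr = ~ ht %% 1 == 0,
                label = "Height is measured to nearest cm") |>
  # na_pass = TRUE (assumed) lets missing heights pass instead of failing
  col_vals_between(columns = vars(ht), left = 0, right = 200, na_pass = TRUE,
                   label = "height between 0 and 200cm") |>
  rows_distinct(label = "duplicated rows") |>
  # Stricter action level on this one step (assumed choice)
  col_vals_not_null(columns = vars(plant_id), actions = al_strict) |>
  interrogate()

agent   # printing the agent shows the report table
```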

Pointblank Validation: Data Validation
Table type: tibble. Action levels: WARN 1, STOP 0.02, NOTIFY not set. Passed and Failed give the number of test units, with the fraction in parentheses.

| Step | Validation | Label | Column | Values | Units | Passed | Failed |
|---|---|---|---|---|---|---|---|
| 1 | col_vals_in_set() | | subplot | A1 to J10 (letters A to J each combined with numbers 1 to 10; 100 codes) | 67K | 67K (1.00) | 0 (0.00) |
| 2 | col_vals_in_set() | | plot_id | CF-1, CF-2, CF-3, CF-4, CF-5, CF-6, FF-1, FF-2, FF-3, FF-4, FF-5, FF-6, FF-7 | 67K | 67K (1.00) | 0 (0.00) |
| 3 | col_vals_expr() | Height is measured to nearest cm | | ht%%1 == 0 | 57K | 57K (1.00) | 0 (0.00) |
| 4 | col_vals_expr() | Number of shoots is integer | | shts%%1 == 0 | 57K | 57K (1.00) | 0 (0.00) |
| 5 | col_vals_expr() | Number of inflorescences is integer | | infl%%1 == 0 | 2K | 2K (1.00) | 0 (0.00) |
| 6 | col_vals_between() | shoots between 0 and 20 | shts | [0, 20] | 67K | 67K (0.99) | 8 (0.01) |
| 7 | col_vals_between() | height between 0 and 200cm | ht | [0, 200] | 67K | 67K (0.99) | 2 (0.01) |
| 8 | col_vals_between() | inflorescences between 0 and 3 | infl | [0, 3] | 67K | 67K (0.99) | 15 (0.01) |
| 9 | rows_distinct() | duplicated rows | | | 67K | 67K (1.00) | 0 (0.00) |
| 10 | col_vals_not_null() | | plant_id | | 67K | 67K (1.00) | 0 (0.00) |
| 11 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 3K | 3K (1.00) | 0 (0.00) |
| 12 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 4K | 4K (1.00) | 0 (0.00) |
| 13 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 5K | 5K (1.00) | 0 (0.00) |
| 14 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 15 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 16 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 17 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 18 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 19 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 7K | 7K (1.00) | 0 (0.00) |
| 20 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 5K | 5K (1.00) | 0 (0.00) |
| 21 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |
| 22 | rows_distinct() | Check for duplicate IDs within each year | plant_id | | 6K | 6K (1.00) | 0 (0.00) |

Interrogation started 2023-05-25 22:10:41 UTC and finished 2023-05-25 22:10:45 UTC (4.6 s).
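Steps 11 to 22 repeat the same `rows_distinct()` check on `plant_id`, once per year. One way to produce that pattern is a loop that adds one step per year, each with a `preconditions` function that keeps only that year's rows. This is a sketch under assumptions: the `year` column, the toy data, and the `year_filter()` helper are hypothetical.

```r
library(pointblank)

# Made-up long-format data: one row per plant per census year
# (the column name `year` is an assumption)
census <- data.frame(
  year     = c(1998, 1998, 1999, 1999, 1999),
  plant_id = c("101", "102", "101", "102", "102")   # "102" repeated in 1999
)

# Hypothetical factory so each step keeps its own year. A bare closure
# written inside the for loop would look up `yr` only at interrogation
# time and would then see the loop's last value for every step.
year_filter <- function(y) {
  force(y)
  function(tbl) tbl[tbl$year == y, , drop = FALSE]
}

agent <- create_agent(tbl = census, label = "Duplicate IDs per year")
for (yr in sort(unique(census$year))) {
  agent <- rows_distinct(
    agent,
    columns = vars(plant_id),
    preconditions = year_filter(yr),
    label = "Check for duplicate IDs within each year"
  )
}
agent <- interrogate(agent)
```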

Year-to-year change

Checks that the year-to-year change in plant size (height and shoot number) is reasonable.
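The sketch below shows one way to derive change columns and check them. It assumes that `ht_diff`, `ht_pc` and `shts_diff` are differences from the same plant's previous measurement, and that `ht_pc` is a proportion. The report compares `ht_pc` with 2 under a 200% label, which suggests a proportion, but the definitions here are not the author's code. The `year` column and the data are hypothetical.

If `ht_pc` = (new − old) / old and heights are never negative, `ht_pc` cannot fall below −1 (−100%). A one-sided `ht_pc < 2` check therefore enforces |change| < 200%. Also note that `col_vals_between()` includes both endpoints by default (`inclusive = c(TRUE, TRUE)`), so [−100, 100] accepts a change of exactly 100 cm.

```r
library(dplyr)       # loaded first so pointblank's vars() is the one used below
library(pointblank)

# Made-up data: two plants measured in consecutive years
census <- data.frame(
  plant_id = c("101", "101", "102", "102"),
  year     = c(1998, 1999, 1998, 1999),
  ht       = c(20, 70, 50, 55),
  shts     = c(2, 3, 4, 12)
)

# Assumed definitions: change from the same plant's previous measurement
growth <- census |>
  arrange(plant_id, year) |>
  group_by(plant_id) |>
  mutate(
    ht_diff   = ht - lag(ht),
    ht_pc     = ht_diff / lag(ht),   # proportional change: 2 means 200%
    shts_diff = shts - lag(shts)
  ) |>
  ungroup()

# First record of each plant has no previous value (NA); na_pass = TRUE lets it pass
agent <- create_agent(tbl = growth, label = "Check growth & regression",
                      actions = action_levels(warn_at = 1, stop_at = 0.02)) |>
  col_vals_lt(columns = vars(ht_pc), value = 2, na_pass = TRUE) |>
  col_vals_between(columns = vars(ht_diff), left = -100, right = 100, na_pass = TRUE) |>
  col_vals_between(columns = vars(shts_diff), left = -5, right = 5, na_pass = TRUE) |>
  interrogate()
```

In this toy data, plant 101 grows from 20 to 70 cm (a proportional change of 2.5) and fails step 1. Plant 102 gains 8 shoots and fails step 3.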

Pointblank Validation: Check growth & regression
Table type: tibble. Action levels: WARN 1, STOP 0.02, NOTIFY not set. Passed and Failed give the number of test units, with the fraction in parentheses.

| Step | Validation | Label | Column | Values | Units | Passed | Failed |
|---|---|---|---|---|---|---|---|
| 1 | col_vals_lt() | \|% change in height\| < 200% | ht_pc | 2 | 67K | 66K (0.99) | 420 (0.01) |
| 2 | col_vals_between() | \|∆ height\| < 100cm | ht_diff | [−100, 100] | 67K | 67K (0.99) | 11 (0.01) |
| 3 | col_vals_between() | \|∆ shoot number\| < 5 | shts_diff | [−5, 5] | 67K | 67K (0.99) | 201 (0.01) |

Interrogation started and finished at 2023-05-25 22:10:48 UTC (< 1 s).

Seedlings

Checks that the size of seedlings is reasonable.
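The source does not say how seedlings are identified. The sketch below assumes a hypothetical logical column `seedling` and uses a hypothetical `seedlings_only()` function as the `preconditions` for both steps, so the size limits apply only to seedling rows. The data are made up.

```r
library(pointblank)

# Made-up data; `seedling` is a hypothetical flag marking seedlings
census <- data.frame(
  plant_id = c("201", "202", "203", "204"),
  seedling = c(TRUE, TRUE, TRUE, FALSE),
  shts     = c(1, 2, 5, 9),    # plant 203 is a suspiciously large seedling
  ht       = c(8, 12, 25, 80)  # plant 204 is not a seedling, so it is not checked
)

# Hypothetical precondition: keep only seedling rows before each check
seedlings_only <- function(tbl) tbl[tbl$seedling, , drop = FALSE]

agent <- create_agent(tbl = census, label = "Check seedlings",
                      actions = action_levels(warn_at = 1, stop_at = 0.02)) |>
  col_vals_lt(columns = vars(shts), value = 3,
              preconditions = seedlings_only, label = "shoots < 3") |>
  col_vals_lt(columns = vars(ht), value = 30,
              preconditions = seedlings_only, label = "height < 30cm") |>
  interrogate()
```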

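Pointblank Validation: Check seedlings
Table type: tibble. Action levels: WARN 1, STOP 0.02, NOTIFY not set. Passed and Failed give the number of test units, with the fraction in parentheses.

| Step | Validation | Label | Column | Values | Units | Passed | Failed |
|---|---|---|---|---|---|---|---|
| 1 | col_vals_lt() | shoots < 3 | shts | 3 | 3K | 3K (0.99) | 12 (0.01) |
| 2 | col_vals_lt() | height < 30cm | ht | 30 | 3K | 3K (0.99) | 3 (0.01) |

Interrogation started and finished at 2023-05-25 22:10:49 UTC (< 1 s).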